The CloudNativePG metrics exporter opens its PostgreSQL connection as the postgres superuser via the pod-local Unix socket, then demotes the session with SET ROLE pg_monitor. SET ROLE changes only current_user; session_user remains postgres. That residual superuser identity is the foothold for the rest of the chain.
Any SQL expression evaluated inside the scrape session can invoke RESET ROLE to recover real superuser privileges, then use COPY ... TO PROGRAM to spawn an OS-level subprocess as the postgres user inside the primary pod. The READ ONLY transaction flag does not block this; it gates writes to database state, not external processes.
Two exploitation paths follow from this root cause.
Path 1 (custom metric queries): a database user who owns a schema on the search_path of any scraped database can plant a shadow object whose name matches an unqualified identifier in a custom metric query. When the exporter next evaluates that query, the shadow expression executes inside the session_user = postgres scrape session. This gives the attacker PostgreSQL superuser privileges and OS command execution inside the primary pod within one scrape interval (≤30 s). Exploitability requires a custom metric query that contains an unqualified relation or function reference.
Although search_path shadowing of unqualified identifiers is the most direct case, the underlying bug is that any expression evaluated inside the scrape session is a superuser code path. Other exploitable shapes include user-defined functions, operators or casts resolved during the scrape, joins or subqueries against user-owned tables and views, and index expressions or RLS policies on read-touched objects.
Path 2 (default monitoring): the pg_extensions metric shipped in default-monitoring.yaml used an unqualified current_database() call and ran against every user database (target_databases: '*'). Any non-superuser who owns a user database (including the default app role created by bootstrap.initdb) could shadow current_database() and trigger the full escalation chain against a stock CNPG deployment on the first scrape after the shadow was planted.
Combined impact
The chain yields privilege escalation from a low-privileged database role (e.g. the default app role) to PostgreSQL superuser, plus arbitrary OS command execution as the postgres user inside the primary pod, all within one scrape interval. A web application SQL injection vulnerability in an app backed by a CNPG cluster is therefore sufficient to pivot to database-pod RCE.
Who is impacted
All deployments on any supported release with default monitoring enabled are affected by Path 2.
All deployments on any supported release that use custom metric queries containing unqualified catalog references are affected by Path 1.
Multi-tenant platforms that allow customers to supply or influence custom metric query bodies are at the highest risk for Path 1.
Patches
Two separate patches address the vulnerability.
Patch 1: PR #10576 "schema-qualify catalog references in default monitoring queries and documentation samples"
Schema-qualifies all unqualified pg_catalog function and view references in the shipped default-monitoring.yaml and in documentation examples. This closes Path 2 in operator-shipped configuration and removes the unqualified-identifier attack surface from all operator-shipped metric queries. Operators who clone or copy default-monitoring.yaml into custom monitoring ConfigMaps, or have copy-pasted unqualified queries elsewhere, must re-qualify those queries themselves.
Backported to all currently supported releases:
v1.29.x (x ≥ 1)
v1.28.x (x ≥ 3)
Patch 2: "dedicated cnpg_metrics_exporter role with pg_ident.conf peer mapping"
Introduces a dedicated cnpg_metrics_exporter PostgreSQL role (granted pg_monitor, no superuser privileges) and maps it in pg_ident.conf via peer authentication on the local Unix socket, following the same pattern already used for cnpg_pooler_pgbouncer. The metrics exporter connects as this role instead of postgres, so session_user is never a superuser and RESET ROLE has no escalation effect. This eliminates the root cause entirely.
Demoting the session at the SQL level (via SET SESSION AUTHORIZATION pg_monitor) is not sufficient: the privilege check for SET SESSION AUTHORIZATION is whether the authenticated user is a superuser, not the current session_user. With the connection still authenticated as postgres, any SQL in the session can run RESET SESSION AUTHORIZATION and recover the original superuser identity. This is the same recovery primitive as RESET ROLE, one layer up. Only changing the authenticated user closes the loop.
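The identity rules above can be made concrete with a small Go model of the three identities PostgreSQL tracks per connection: the authenticated user, session_user, and current_user. This is a simplified sketch, not PostgreSQL or CloudNativePG code. The `session` type and its methods are hypothetical, role membership and connection-time role settings are not modeled, and only the rules stated above are encoded: SET ROLE changes current_user only, SET SESSION AUTHORIZATION is checked against the authenticated user, and both RESET forms restore the earlier identity.

```go
package main

import "errors"

// session is a hypothetical, simplified model of PostgreSQL connection
// identities. Role membership inheritance, security-definer functions and
// connection-time role settings are deliberately not modeled.
type session struct {
	authenticated string          // role that passed authentication; never changes
	sessionUser   string          // what session_user returns
	currentUser   string          // what current_user returns; used for privilege checks
	superusers    map[string]bool // roles with the SUPERUSER attribute
}

func newSession(authenticated string, superusers map[string]bool) *session {
	return &session{
		authenticated: authenticated,
		sessionUser:   authenticated,
		currentUser:   authenticated,
		superusers:    superusers,
	}
}

// setRole models SET ROLE: only current_user changes. A superuser session
// user may pick any role; membership for non-superusers is not modeled.
func (s *session) setRole(role string) error {
	if !s.superusers[s.sessionUser] && role != s.sessionUser {
		return errors.New("permission denied to set role " + role)
	}
	s.currentUser = role
	return nil
}

// resetRole models RESET ROLE: current_user returns to session_user.
func (s *session) resetRole() { s.currentUser = s.sessionUser }

// setSessionAuthorization models SET SESSION AUTHORIZATION: the permission
// check uses the authenticated user, not the current session_user.
func (s *session) setSessionAuthorization(role string) error {
	if !s.superusers[s.authenticated] && role != s.authenticated {
		return errors.New("permission denied to set session authorization")
	}
	s.sessionUser, s.currentUser = role, role
	return nil
}

// resetSessionAuthorization restores both identities to the authenticated user.
func (s *session) resetSessionAuthorization() {
	s.sessionUser, s.currentUser = s.authenticated, s.authenticated
}

// isSuperuser reports whether privilege checks currently run as a superuser.
func (s *session) isSuperuser() bool { return s.superusers[s.currentUser] }
```

In this model, a session authenticated as `postgres` regains superuser after either RESET form. A session authenticated as `cnpg_metrics_exporter` has no path back to it.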
With this change in place, the original chain breaks at every step. RESET ROLE and RESET SESSION AUTHORIZATION cannot recover superuser, and COPY ... TO PROGRAM requires superuser or membership in pg_execute_server_program, neither of which pg_monitor grants. As defense in depth, the monitoring transaction also prepends pg_catalog to the connection's search_path, so unqualified catalog identifiers cannot resolve to user-planted shadow objects.
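The effect of prepending pg_catalog can be sketched with a simplified model of unqualified function lookup. According to the PostgreSQL documentation, pg_catalog is searched implicitly before the listed schemas unless the search_path names it explicitly. A path that lists pg_catalog after a user-writable schema is one way a same-signature function in that schema can take precedence. The functions `resolveFunction` and `withCatalogFirst` are hypothetical, and the model ignores overload resolution, pg_temp, and `$user` expansion.

```go
package main

import "slices"

// resolveFunction models, in simplified form, how an unqualified function
// call resolves when every candidate has the same signature: the first
// schema in the effective search path that defines the name wins.
func resolveFunction(name string, searchPath []string, defs map[string][]string) string {
	effective := searchPath
	// pg_catalog is searched implicitly first unless the path names it explicitly.
	if !slices.Contains(searchPath, "pg_catalog") {
		effective = append([]string{"pg_catalog"}, searchPath...)
	}
	for _, schema := range effective {
		if slices.Contains(defs[schema], name) {
			return schema + "." + name
		}
	}
	return "" // not found in any schema on the path
}

// withCatalogFirst prepends pg_catalog to a search_path, as the patched
// monitoring transaction is described as doing.
func withCatalogFirst(searchPath []string) []string {
	return append([]string{"pg_catalog"}, searchPath...)
}
```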
This patch changes the connection identity but not how queries are evaluated. Custom metric queries within pg_monitor's scope (catalog reads, pg_stat_* views, settings) continue to work without modification. Queries that previously relied on superuser-level access (reading user-owned tables not granted to cnpg_metrics_exporter, or superuser-only catalogs such as pg_authid or pg_subscription) will fail and need explicit GRANT statements to cnpg_metrics_exporter.
The role is created and maintained with PASSWORD NULL. Any password set out-of-band is cleared on the next reconcile, so the role cannot authenticate by password even if it was created manually beforehand.
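A reconcile step that enforces these attributes could look like the following Go sketch. It turns an observed role state into the SQL statements needed to converge it. The advisory does not show the operator's actual statements, so the `roleState` fields, the function name, and the decision to also enforce LOGIN and NOSUPERUSER are assumptions made for illustration.

```go
package main

// roleState is a hypothetical snapshot of the exporter role's catalog state.
type roleState struct {
	exists      bool
	canLogin    bool
	superuser   bool
	hasPassword bool
	inPgMonitor bool
}

// reconcileExporterRole returns the statements needed to reach the desired
// state: LOGIN, NOSUPERUSER, PASSWORD NULL, and membership in pg_monitor.
func reconcileExporterRole(s roleState) []string {
	const role = "cnpg_metrics_exporter"
	if !s.exists {
		return []string{
			"CREATE ROLE " + role + " LOGIN NOSUPERUSER PASSWORD NULL",
			"GRANT pg_monitor TO " + role,
		}
	}
	var stmts []string
	if !s.canLogin || s.superuser {
		stmts = append(stmts, "ALTER ROLE "+role+" LOGIN NOSUPERUSER")
	}
	if s.hasPassword {
		// A password set out-of-band is cleared, so password auth cannot succeed.
		stmts = append(stmts, "ALTER ROLE "+role+" PASSWORD NULL")
	}
	if !s.inPgMonitor {
		stmts = append(stmts, "GRANT pg_monitor TO "+role)
	}
	return stmts
}
```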
For replica clusters, upgrade the source primary cluster before any replica clusters that consume from it. The cnpg_metrics_exporter role is created on the source primary and replicates downstream; a replica cluster upgraded first will scrape against a missing role until the source primary upgrades or the role is created manually (see the monitoring documentation).
The patch will be backported to all currently supported releases:
v1.29.x (x ≥ 1)
v1.28.x (x ≥ 3)
Workarounds
If upgrading immediately is not possible:
Schema-qualify all identifiers in custom metric queries. Use explicit pg_catalog. prefixes for all catalog functions and views (e.g. pg_catalog.current_database(), pg_catalog.now()). This is a partial mitigation: it closes the search_path-shadowing shape in operator- and user-supplied metric bodies, but other expression shapes (user-defined functions, operators or casts; joins or subqueries on user-owned tables and views; RLS policies on read-touched objects) remain superuser code paths until Patch 2 lands.
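Before upgrading, you may want to audit custom metric queries. As a starting point, the following Go sketch flags calls to a caller-supplied list of catalog function names that lack a schema prefix. It is a hypothetical, regex-based heuristic (`findUnqualifiedCalls` is not part of CloudNativePG). Because it does not parse SQL, it can be fooled by string literals, comments, or whitespace around the dot. It also ignores relation references and the other expression shapes listed above. Treat a clean result as a hint, not as proof.

```go
package main

import (
	"regexp"
	"slices"
	"strings"
)

// callPattern captures an optional leading dot (group 1) and an identifier
// followed by "(" (group 2). An empty group 1 means the call is unqualified.
var callPattern = regexp.MustCompile(`(?i)(\.?)\b([a-z_][a-z0-9_]*)\s*\(`)

// findUnqualifiedCalls returns every call to a name in catalogFuncs that is
// not schema-qualified. Naive: string literals and comments are not skipped.
func findUnqualifiedCalls(query string, catalogFuncs []string) []string {
	var hits []string
	for _, m := range callPattern.FindAllStringSubmatch(query, -1) {
		if m[1] != "" {
			continue // preceded by a dot, e.g. pg_catalog.now()
		}
		// Unquoted SQL identifiers fold to lower case.
		if name := strings.ToLower(m[2]); slices.Contains(catalogFuncs, name) {
			hits = append(hits, name)
		}
	}
	return hits
}
```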
Restrict database ownership. Ensure only fully trusted roles own user databases in scraped clusters. The exploit requires the ability to plant an object on the metrics exporter's search_path in a scraped database, typically by owning the database (and therefore public via pg_database_owner) or by holding CREATE on a schema already reachable through search_path.
PG <15 caveat: before PostgreSQL 15, the public schema grants CREATE to PUBLIC by default, so any authenticated role in a scraped database can plant a shadow object regardless of ownership.
Limit the scope of target_databases: '*' queries. Avoid target_databases: '*' unless every database in the cluster, and every role that owns one, is fully trusted. Where possible, restrict target_databases to specific, known-safe databases.
Do not expose metric query SQL to untrusted users. Multi-tenant platforms that allow customers to supply or influence custom metric query bodies should treat this as a critical trust boundary until the architectural fix is released.
References
Fix (Patch 1): PR #10576 "schema-qualify catalog references in default monitoring queries and documentation samples"
Fix (Patch 2): "dedicated cnpg_metrics_exporter role with pg_ident.conf peer mapping"
Metrics exporter privilege escalation (CVE-2026-44477, GHSA-423p-g724-fr39): the metrics exporter no longer authenticates as the postgres superuser. It now uses a dedicated cnpg_metrics_exporter role with pg_monitor privileges only. This closes a chain that let a low-privilege database user gain PostgreSQL superuser privileges.
Upgrade impact: custom monitoring queries that read user-owned tables, or use target_databases: '*' against databases where PUBLIC CONNECT has been revoked, need explicit GRANT statements to cnpg_metrics_exporter. See "Custom query privileges and safety" and "Manually creating the metrics exporter role" in the monitoring documentation.
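For example, granting the exporter read access to specific user tables might involve statements like those produced by this Go sketch. The helper names and inputs are hypothetical. The CONNECT grant can be issued from any database, but the USAGE and SELECT grants must be run while connected to the target database. Identifiers are double-quoted with embedded quotes doubled, which is PostgreSQL's rule for quoted identifiers. Quoting preserves case, so pass names exactly as stored in the catalog.

```go
package main

import "strings"

// quoteIdent quotes a PostgreSQL identifier: wrap in double quotes and
// double any embedded double quote.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// exporterGrants returns illustrative GRANT statements giving the exporter
// role read access to the listed tables in one schema of one database.
func exporterGrants(database, schema string, tables []string) []string {
	role := quoteIdent("cnpg_metrics_exporter")
	stmts := []string{
		"GRANT CONNECT ON DATABASE " + quoteIdent(database) + " TO " + role,
		"GRANT USAGE ON SCHEMA " + quoteIdent(schema) + " TO " + role,
	}
	for _, t := range tables {
		stmts = append(stmts, "GRANT SELECT ON TABLE "+quoteIdent(schema)+"."+quoteIdent(t)+" TO "+role)
	}
	return stmts
}
```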
For replica clusters, upgrade the source primary cluster before any replica clusters that consume from it. The cnpg_metrics_exporter role is created on the source primary and replicates downstream; a replica cluster upgraded first will scrape against a missing role until the source primary upgrades. The manual-recovery section linked above also covers replica clusters.
Schema-qualified catalog references in default monitoring queries: hardened the shipped monitoring configuration and documentation samples by qualifying every pg_catalog object explicitly. Unqualified references resolve through search_path, which a database user can manipulate to shadow built-in objects. (#10576)
Discoverable SBOM and provenance attestations: SBOM and SLSA provenance attached to operator container images now follow the OCI 1.1 Referrers spec, so standard registry tooling and supply-chain scanners can discover them automatically. (#10601)
CVE remediation in github.com/jackc/pgx/v5: bumped to v5.9.2 to pick up upstream fixes for CVE-2026-33816 (memory-safety in pgproto3) and GHSA-j88v-2chj-qfwx (SQL injection via simple-protocol dollar-quoted string handling). (#10436, #10498)
Build pipeline hardening: the Go 1.26.3 bump also addresses CVE-2026-42501 (cmd/go module-checksum validation), reducing supply-chain exposure during release builds. The affected code paths are not reachable from the running operator. (#10647)
Changes
Switched TLS peer verification from VerifyPeerCertificate to VerifyConnection, which runs on every completed handshake (the former is skipped on resumed TLS 1.3 sessions). Session resumption is not enabled in CloudNativePG today, so this has no observable effect, but it future-proofs verification if session caching is introduced later. (#10478)
Fixes
Fixed a failover window where the former primary kept its primary label. If it returned during failover (for example, after a transient network partition), the -rw service kept routing to it, replicas could reconnect, and committed writes were lost to pg_rewind. The old primary is now labeled unhealthy to isolate it from service traffic during failover. (#10409)
Fixed failover not being triggered when the node hosting the primary becomes unreachable. The operator now reads the pod's Ready condition (flipped to False by the node controller when the kubelet stops reporting) instead of ContainersReady, which stays stale as True in that scenario. Combined with the spurious-failover guard (#10445), failover triggers only when Kubernetes itself marks the pod not Ready. (#10448)
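The distinction can be illustrated with a minimal Go sketch. `podCondition` is a hypothetical local stand-in for the two fields of Kubernetes' `corev1.PodCondition` that matter here, which keeps the example free of external dependencies. Real code would read the conditions from the pod's status.

```go
package main

// podCondition mirrors the Type and Status fields of corev1.PodCondition.
type podCondition struct {
	Type   string // e.g. "Ready", "ContainersReady"
	Status string // "True", "False" or "Unknown"
}

// isPodReady reports whether the pod's Ready condition is True. ContainersReady
// is ignored because it can stay True when the node stops reporting.
func isPodReady(conds []podCondition) bool {
	for _, c := range conds {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false // no Ready condition recorded: treat as not ready
}
```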
Fixed spurious failovers caused by transient failures on the primary's HTTP status endpoint. (#10445)
Fixed escaping of backslashes and control characters in PostgreSQL configuration values. Previously, such characters in parameters like log_line_prefix could corrupt the configuration file or be silently stripped at runtime. (#10515)
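One way to produce a safe single-quoted value is sketched below in Go. `quoteConfValue` is a hypothetical helper, not the operator's implementation. It is based on PostgreSQL's configuration-file syntax: single quotes and backslashes are doubled, and newlines and other control characters become backslash escapes, which the configuration-file parser decodes back to the original characters.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteConfValue renders s as a single-quoted postgresql.conf value that
// stays on one line and decodes back to s.
func quoteConfValue(s string) string {
	var b strings.Builder
	b.WriteByte('\'')
	for _, r := range s {
		switch r {
		case '\'':
			b.WriteString("''") // doubled quote
		case '\\':
			b.WriteString(`\\`) // literal backslash
		case '\n':
			b.WriteString(`\n`)
		case '\r':
			b.WriteString(`\r`)
		case '\t':
			b.WriteString(`\t`)
		case '\b':
			b.WriteString(`\b`)
		case '\f':
			b.WriteString(`\f`)
		default:
			if r < 0x20 || r == 0x7f {
				fmt.Fprintf(&b, `\%03o`, r) // other control chars as octal escapes
			} else {
				b.WriteRune(r)
			}
		}
	}
	b.WriteByte('\'')
	return b.String()
}
```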
Fixed restore_command construction to shell-quote each argument. Values such as a destinationPath containing whitespace (for example, s3://my bucket/wal) were word-split by the POSIX shell and passed to the WAL restore tool as separate arguments. (#10518)
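The usual POSIX technique is to wrap each argument in single quotes and write any embedded single quote as `'\''`. The Go sketch below applies it. `shellQuote`, `buildCommand`, and the program name `wal-restore` are hypothetical placeholders, not the operator's code or a real tool.

```go
package main

import "strings"

// shellQuote single-quotes s for a POSIX shell. Inside single quotes nothing
// is special except the quote itself, written as '\'' (close, escaped quote, reopen).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// buildCommand joins a program and its arguments into one command line,
// quoting every word so embedded whitespace cannot split it.
func buildCommand(program string, args ...string) string {
	parts := make([]string, 0, len(args)+1)
	parts = append(parts, shellQuote(program))
	for _, a := range args {
		parts = append(parts, shellQuote(a))
	}
	return strings.Join(parts, " ")
}
```

In restore_command specifically, PostgreSQL substitutes %f and %p textually before the shell runs, and a literal % must be written as %%. This sketch does not handle that escaping, and the resulting command line still needs configuration-file quoting when written to postgresql.conf.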
Tightened recoveryTarget validation in the admission webhook: targetXID must now be a non-negative 32-bit integer, and targetName must be shorter than 64 bytes and free of ASCII control characters. Malformed values are rejected at admission instead of failing later during PostgreSQL recovery. (#10565)
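The two admission rules can be sketched in Go as follows. The function names are hypothetical. The rules follow the text above, with "ASCII control characters" interpreted as bytes 0x00 to 0x1F and 0x7F.

```go
package main

import (
	"errors"
	"strconv"
)

// validateTargetXID accepts only a non-negative integer that fits in 32 bits.
func validateTargetXID(xid string) error {
	if _, err := strconv.ParseUint(xid, 10, 32); err != nil {
		return errors.New("targetXID must be a non-negative 32-bit integer")
	}
	return nil
}

// validateTargetName requires fewer than 64 bytes and no ASCII control characters.
func validateTargetName(name string) error {
	if len(name) >= 64 {
		return errors.New("targetName must be shorter than 64 bytes")
	}
	for i := 0; i < len(name); i++ {
		if c := name[i]; c < 0x20 || c == 0x7f {
			return errors.New("targetName must not contain ASCII control characters")
		}
	}
	return nil
}
```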
Fixed snapshot restores failing when leftover pgsql_tmp* directories were present in the data directory. (#10447)
Fixed a deadlock occurring when PVC storage size and resource requests are changed simultaneously. (#10427)
Updated the deprecation notice for native (in-tree) Barman Cloud support to reflect that it will now be removed in CloudNativePG 1.30.0, rather than 1.29.0. Users are still encouraged to migrate to the Barman Cloud Plugin. (#10167)
Enhancements
Improved the Pooler CRD with support for granular configuration of TLS cipher suites and minimum/maximum TLS versions. This enables administrators to meet strict security compliance requirements for pooler-to-client and pooler-to-server connections. Contributed by @alex1989hu. (#9571)
Improved the reliability of major upgrades by setting BackoffLimit=0 on the upgrade job, preventing unnecessary retries of a failed pg_upgrade. The operator now automatically deletes the failed job when a user reverts the container image, allowing the cluster to restart gracefully on the original version. (#10104, #10298)
Improved role management by verifying the instance is the primary before each reconciliation cycle, avoiding unnecessary reconciliation attempts and spurious error messages on read-only replicas. (#9971)
Extended the CRD schemas for Cluster, ImageCatalog, and ClusterImageCatalog to accept the extensions, bin_path, and env fields introduced in 1.29. The operator ignores these fields on older versions, but accepting them in the schema allows users to share a single manifest across clusters running different CNPG versions. (#10131, #10387)
The operator now honors the primaryUpdateMethod when adding new PVCs to a cluster, ensuring that the rollout strategy (e.g., switchover vs. restart) is respected during storage expansion or additions. (#9720)
Refined the alpha.cnpg.io/unrecoverable annotation logic to allow it to function even on pods that have not yet reached the Ready state, facilitating the recovery of stuck instances. (#9968)
Security and Supply Chain
Security best practices integration: integrated the OpenSSF baseline scanner and added a SECURITY-INSIGHTS.yaml file to the repository to align with industry-standard security reporting. (#10054, #10062)
SLSA provenance and SBOMs: added SLSA (Supply-chain Levels for Software Artifacts) provenance to release binaries and container images. Additionally, enabled Software Bill of Materials (SBOM) generation within the GoReleaser pipeline for improved dependency transparency. (#10048, #10074)
Password leak prevention: fixed a potential security risk where PostgreSQL could leak role passwords in the logs during specific reconciliation phases. (#9950)
Changes
Updated the default PostgreSQL version to 18.3 (image 18.3-system-trixie). (#10090)
Fixes
Fixed a deadlock during operator upgrades affecting clusters using synchronous replication. Pods running the old and new operator versions computed different PostgreSQL configuration hashes, so the uniformity check blocked indefinitely and neither rolling updates nor in-place upgrades could proceed. (#10342)
Fixed an issue where fencing annotations could not be processed when the WAL disk was full, because the disk space check blocked the instance manager from starting. The check is now performed later in the lifecycle loop, after fencing is evaluated. (#10302)
Fixed an issue where replicas would get stuck in a Pending state if the VolumeSnapshot used for the initial bootstrap had been deleted. The operator now validates snapshot existence before use; if a snapshot is missing, it attempts to use the next available candidate or falls back to pg_basebackup. (#10192)
Prevented the "supervised primary" rollout strategy from consuming all available rollout slots, which previously caused delays in scheduled updates. Contributed by @ermakov-oleg. (#9977)
Fixed an issue where certain hot-standby parameter changes were not being correctly applied to replica clusters. (#9952)
Fixed a bug in the CNPG-I reconciler hook that could lead to skipping subsequent plugins when a "continue" result was returned. Contributed by @sharifmshaker. (#9978)
Fixed a deadlock scenario that occurred when attempting to resize a filesystem on a PVC that was not currently attached to a Pod. Contributed by @jmealo. (#9981)
Fixed webhook validation of bootstrap recovery sources to accept external clusters configured with ConnectionParameters (for pg_basebackup-based recovery). Previously, these were incorrectly rejected unless a Barman object store or CNPG-I plugin was also configured. (#10268)
Volume names for extensions and tablespaces are now prefixed to avoid naming collisions with standard cluster volumes. (#9973)
When hibernating a non-healthy cluster, the operator now reports a WaitingForHealthy condition, making the deferred hibernation state visible through cnpg status. (#10193)
Fixed fencing to work correctly even when the target pod does not exist. Fencing operates on a cluster-level annotation and should not depend on pod existence; instance name validation is now performed only in the cnpg fencing on command. (#10035)
Fixed the cluster and pooler service reconcilers to correctly handle changes to all spec fields when using the patch update strategy. The reconciler now uses RFC 7386 JSON Merge Patching, preventing cloud-provider-set fields (such as loadBalancerClass) from being inadvertently removed. (#10190, #10311)
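RFC 7386 defines a short recursive algorithm. A direct Go transcription shows why fields absent from the patch, such as a cloud-provider-set loadBalancerClass, survive: only members named in the patch are replaced, and only members set to null are removed. `mergePatch` is an illustrative sketch that operates on the generic values produced by encoding/json. It is not the operator's code.

```go
package main

// mergePatch applies an RFC 7386 JSON Merge Patch to target and returns the
// result without mutating target. Values use encoding/json's generic forms
// (map[string]any, []any, string, float64, bool, nil).
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch // a non-object patch replaces the target wholesale
	}
	t, isObj := target.(map[string]any)
	out := make(map[string]any, len(t))
	if isObj {
		for k, v := range t {
			out[k] = v // members not named in the patch are kept as-is
		}
	}
	for k, v := range p {
		if v == nil {
			delete(out, k) // null in the patch removes the member
			continue
		}
		out[k] = mergePatch(out[k], v)
	}
	return out
}
```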
Fixed a race condition in the deprecated in-tree Barman Cloud backup implementation affecting parallel WAL restore, where prefetched files could be read while still being downloaded, causing PostgreSQL recovery to fail with "invalid checkpoint record" errors. (#10285)
Fixed the timeline history file validation to also apply to plugin-based WAL restore. Previously, the protection introduced in #9650 only covered in-tree restores, allowing plugins to bypass the check and download future timeline history files, causing timeline mismatch errors on replicas. (#9849)
cnpg plugin:
The cnpg plugin now correctly propagates ImagePullSecrets to the pgbench Job pod template. (#10174)
Added support for Azure's DefaultAzureCredential authentication mechanism for backup and recovery operations. This can be enabled by setting azureCredentials.useDefaultAzureCredentials: true in the backup configuration, simplifying authentication in Azure environments without requiring explicit storage account keys or SAS tokens. (#9468)
Fixes
Fixed validation of PostgreSQL extension names containing underscores (e.g., pg_partman, pg_ivm). Extension names with underscores are automatically sanitized to use hyphens for Kubernetes volume names while preserving the original name in mount paths. Webhook validation prevents naming conflicts after sanitization. Contributed by @shusaan. (#9386)
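The sanitization and conflict check can be sketched in Go as shown below. `volumeNameFor` and `checkVolumeNameConflicts` are hypothetical helpers, and the operator's actual rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// volumeNameFor replaces underscores with hyphens so the extension name is
// usable as a Kubernetes volume name; mount paths keep the original name.
func volumeNameFor(extension string) string {
	return strings.ReplaceAll(extension, "_", "-")
}

// checkVolumeNameConflicts rejects extension lists whose sanitized names
// collide, e.g. "pg_foo" and "pg-foo".
func checkVolumeNameConflicts(extensions []string) error {
	seen := make(map[string]string, len(extensions))
	for _, ext := range extensions {
		v := volumeNameFor(ext)
		if prev, ok := seen[v]; ok {
			return fmt.Errorf("extensions %q and %q both map to volume name %q", prev, ext, v)
		}
		seen[v] = ext
	}
	return nil
}
```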
Fixed a critical issue where the TimelineID in the cluster status was not reset to 1 after a major version upgrade. Because pg_upgrade initializes a new timeline, keeping the old ID (e.g., timeline 2) caused replicas to attempt to restore incompatible history files from object storage, leading to fatal "requested timeline is not a child of this server's history" errors. (#9830)
Fixed an issue where stale TLS status fields in the Pooler were not cleared after being removed from the specification. This was particularly critical when upgrading to v1.28.0, where the ServerTLS field was repurposed, causing PgBouncer to use incorrect certificates and resulting in "unsupported certificate" errors that blocked all application connectivity. The operator now explicitly clears ServerCA, ClientCA, ClientTLS, and ServerTLS status fields when they are no longer configured. (#9397)
Fixed a bug where replicas could enter a crash-loop by attempting to download timeline history files from future timelines. This occurred when stale files remained in the WAL archive from a previous cluster life, and replicas would incorrectly try to fetch them during recovery. (#9650)
Fixed a race condition in replica_cluster setups during designated primary transitions, preventing transient "no primary" states in the replica cluster. (#9601)
The backup controller now uses the unique instance session ID to detect instance manager restarts. This prevents the operator from incorrectly assuming a backup is still progressing if the underlying container has crashed and restarted, which previously led to orphaned backup objects. (#9370)
Fixed a validation gap in Azure object store configurations where the storageAccount was not required when using explicit credentials (such as a storage key or SAS token). The operator now enforces that a storage account name is provided in these cases and that connectionString is mutually exclusive with other authentication parameters. (#9604)
Optimized the deletion path so the operator begins cleaning up resources immediately when a cluster is marked for deletion. This significantly reduces the time a cluster remains in Terminating status while waiting for internal reconciliation loops. (#9555)
Fixed an issue where replication slots were not properly dropped from replicas when the feature was disabled or the cluster was reconfigured. This ensures that unused slots do not cause WAL build-up on the primary. (#9381)
Fixed an issue where imagePullSecrets were not added to the ServiceAccount created for the Pooler. Previously, these secrets were applied to the Deployment but not the SA, which caused image pull failures in restricted environments using certain security policies. (#9427)
Added a check to verify ownership before the operator deletes a PodMonitor. This prevents the operator from accidentally deleting manually managed monitoring resources that happen to share a name with expected CNPG resources. Contributed by @juliamertz. (#9340)
Fixed a bug where pg_stat_archiver metrics would continue to report stale data on standby instances after a switchover. The exporter now skips these metrics on standbys, as PostgreSQL only provides valid archiver stats on the primary. (#9411)
Clarified the interpretation of timestamp formats for recovery targetTime. Timestamps provided without an explicit timezone are now consistently interpreted as UTC. Contributed by @pchovelon. (#8937)
Fixed backup status updates to prevent "resource has been modified" errors during concurrent updates. (#9551)
Fixed event reporting to use the correct pod name when a backup pod is not found. (#9552)
Improved performance of scheduled backup operations for clusters with a very high number of historical backups. (#9489)
Fixed error handling when removing finalizers on Database objects. (#9431)
cnpg plugin:
Updated the status command to display "Disabled" when the skipWalArchiving annotation is present on a cluster. This replaces confusing "starting up" or "unknown" states when WAL archiving is intentionally bypassed. (#9709)
Fixed the logs --follow command to continue polling for new pods instead of exiting prematurely when all current log streams complete. (#9599)
Quorum-Based Failover Promoted to Stable: Promoted the quorum-based failover feature, introduced experimentally in 1.27.0, to a stable API. This data-driven failover mechanism is now configured via the spec.postgresql.synchronous.failoverQuorum field, graduating from the previous alpha.cnpg.io/failoverQuorum annotation. (#8589)
Declarative Foreign Data Management: Introduced comprehensive declarative management for Foreign Data Wrappers (FDW) by extending the Database CRD. This feature adds the .spec.fdws and .spec.servers fields, allowing you to manage FDW extensions and their corresponding foreign servers directly from the Database resource. This work was implemented by Ying Zhu (@EdwinaZhu) as part of the LFX Mentorship Program 2025 Term 2. (#7942, #8401)
Changes
Updated the default PostgreSQL version to 18.1-system-trixie. (#9178)
Updated the default PgBouncer version to 1.25.1 for new Pooler deployments. (#9367)
Enhancements
Enabled simultaneous image and configuration changes when using primaryUpdateMethod: restart, allowing you to update the container image (including PostgreSQL version or extensions) and PostgreSQL configuration settings in the same operation. Note that when using primaryUpdateMethod: switchover, image and configuration changes must still be performed separately to avoid configuration mismatches during the switchover process. (#8241)
Improved network failure detection for replica instances by setting the default tcp_user_timeout to 5 seconds. This change helps replicas detect and recover from silent network drops more quickly. Previously, replicas could wait up to 127 seconds before detecting such failures; with the new timeout, they reconnect to the primary within 5 seconds. To preserve the previous behavior, set STANDBY_TCP_USER_TIMEOUT to 0 in the operator configuration. (#9317)
Adopted standard Kubernetes recommended labels (e.g., app.kubernetes.io/name) for all resources generated by CloudNativePG (Clusters, Backups, Poolers, etc.). Contributed by @JefeDavis. (#8087)
Introduced securityContext at the pod level and containerSecurityContext for individual containers (including postgres, init, and sidecars). This provides granular control over security settings, replacing the previous cluster-wide postgres and operator user settings. Contributed by @x0ddf. (#6614)
Introduced the alpha.cnpg.io/unrecoverable=true annotation for replica pods. When applied, this annotation instructs the operator to permanently delete the instance by removing its Pod and PVCs, after which it will recreate the replica from the primary. (#8178)
Introduced a new caching layer for user-defined monitoring queries to reduce load on the PostgreSQL database. (#8003)
Enhanced PgBouncer integration by automatically setting auth_dbname to the pgbouncer database, simplifying auth setup. (#8671)
Allowed providing stage-specific pg_restore options (preRestore, postRestore, dataRestore) during database import. Contributed by @hanshal101. (#7690)
Added the PostgreSQL majorVersion to the Backup object's status for easier identification and management. (#8464)
Enhanced cluster restore to wait for all init containers to complete before starting the restore process. This ensures that backup tools running in init containers finish preparing the data before the restore begins. The implementation correctly handles Kubernetes init container sidecars by ignoring those with RestartPolicy=Always. (#9026)
Added the PGBOUNCER_IMAGE_NAME operator configuration parameter to allow overriding the default PgBouncer image. This is useful for air-gapped environments or when using internal registries. (#9232)
cnpg plugin:
Added a --timeout flag to the kubectl cnpg status command for configuring the timeout for filesystem operations such as calculating cluster size. The default remains 10 seconds but can be adjusted for large clusters where operations may take longer. (#9201)
Improved cnpg report to generate more shell-friendly file names. (#8984)
Security
Allowed providing fine-grained custom TLS configurations for PgBouncer. The Pooler CRD was extended with clientTLSSecret, clientCASecret, serverTLSSecret, and serverCASecret fields under .spec.pgbouncer. These fields enable users to supply their own certificates for both client-to-pooler and pooler-to-server connections, taking precedence over the operator-generated certificates. (#8692)
Added optional TLS support for the operator's metrics server (port 8080). This feature is opt-in and enabled by setting the METRICS_CERT_DIR environment variable, which instructs the operator to look for tls.crt and tls.key files in the specified directory. When unset, the server continues to use HTTP for backward compatibility. (#8997)
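A minimal sketch of that opt-in logic using Go's net/http follows. `metricsTLSFiles` and `serveMetrics` are hypothetical names. Only the environment variable and the file names come from the entry above.

```go
package main

import (
	"net/http"
	"os"
	"path/filepath"
)

// metricsTLSFiles returns the certificate and key paths when certDir is set;
// an empty certDir means plain HTTP.
func metricsTLSFiles(certDir string) (certFile, keyFile string, useTLS bool) {
	if certDir == "" {
		return "", "", false
	}
	return filepath.Join(certDir, "tls.crt"), filepath.Join(certDir, "tls.key"), true
}

// serveMetrics serves the handler over TLS only when METRICS_CERT_DIR is set.
// It blocks until the server stops.
func serveMetrics(addr string, handler http.Handler) error {
	certFile, keyFile, useTLS := metricsTLSFiles(os.Getenv("METRICS_CERT_DIR"))
	if useTLS {
		return http.ListenAndServeTLS(addr, certFile, keyFile, handler)
	}
	return http.ListenAndServe(addr, handler)
}
```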
Enabled cnpg report operator to work with minimal permissions by making only the operator deployment required. All other resources (pods, secrets, config maps, events, webhooks, and OLM data) are now optional and collected on a best-effort basis. The command handles permission errors for those resources gracefully: it logs clear warnings and continues generating the report with the available data rather than failing completely. This enables least-privilege access for users who have only limited, namespace-scoped permissions. (#8982)
Fixes
Improved resilience of all probe types (liveness, readiness, and startup) to transient Kubernetes API server connectivity issues. Probes now use a caching mechanism that falls back to cached cluster definitions during brief network interruptions, preventing unnecessary pod restarts and probe failures. (#9148)
Fixed the CheckEmptyWalArchive safeguard to run correctly when restoring from a volume snapshot using CNPG-I backup/WAL plugins (e.g., plugin-barman-cloud). Previously, this check was skipped for plugin-based implementations. (#9306)
Improved error reporting when ImageCatalog retrieval fails. The operator now emits a Warning event and logs errors for all failure types, not just NotFound errors, improving visibility into configuration issues. (#9266)
Fixed TLS certificate verification issues when connecting to CNPG-I plugins by adding the cnpg.io/pluginServerName annotation. This allows customizing the DNS name used for certificate verification in environments where the plugin's certificate uses a different DNS name than the Service name. (#9222)
Fixed an issue where the instance manager controller could fail to restart after an error, reporting a "controller already exists" message. The controller now uses SkipNameValidation for subsequent initialization attempts. Contributed by @mateusoliveira43. (#9123)
Fixed incorrect WAL restore path handling in plugins when the destination path is absolute, preventing path duplication issues. Contributed by @Endevir. (#9093)
Fixed the CREATE PUBLICATION SQL generation for multi-table publications to be backward-compatible with PostgreSQL 13+. The previously generated syntax was only valid for PostgreSQL 15+ and caused syntax errors on older versions. (#8888)
Fixed backup failures in complex pod definitions by reliably selecting the postgres container by name instead of by index. Contributed by @Joda89. (#8964)
cnpg plugin:
Fixed bugs in cnpg report log collection, especially when fetching previous logs. The collector now correctly fetches previous and current logs in separate requests and gracefully handles missing previous logs (e.g., on containers with no restart history), ensuring current logs are always collected. (#8992)
Warning: This is the final release in the 1.27.x series. Users are strongly encouraged to upgrade to a newer minor version, as 1.27 is no longer supported.
Important changes
Updated the deprecation notice for native (in-tree) Barman Cloud support to reflect that it will now be removed in CloudNativePG 1.30.0, rather than 1.29.0. Users are still encouraged to migrate to the Barman Cloud Plugin. (#10167)
Enhancements
Improved the Pooler CRD with support for granular configuration of TLS cipher suites and minimum/maximum TLS versions. This enables administrators to meet strict security compliance requirements for pooler-to-client and pooler-to-server connections. Contributed by @alex1989hu. (#9571)
Improved the reliability of major upgrades by setting BackoffLimit=0 on the upgrade job, preventing unnecessary retries of a failed pg_upgrade. The operator now automatically deletes the failed job when a user reverts the container image, allowing the cluster to restart gracefully on the original version. (#10104, #10298)
Improved role management by verifying the instance is the primary before each reconciliation cycle, avoiding unnecessary reconciliation attempts and spurious error messages on read-only replicas. (#9971)
Extended the CRD schemas for Cluster, ImageCatalog, and ClusterImageCatalog to accept the extensions, bin_path, and env fields introduced in 1.29. The operator ignores these fields on older versions, but accepting them in the schema allows users to share a single manifest across clusters running different CNPG versions. (#10131, #10387)
The operator now honors the primaryUpdateMethod when adding new PVCs to a cluster, ensuring that the rollout strategy (e.g., switchover vs. restart) is respected during storage expansion or additions. (#9720)
Security and Supply Chain
Security best practices integration: integrated the OpenSSF baseline scanner and added a SECURITY-INSIGHTS.yaml file to the repository to align with industry-standard security reporting. (#10054, #10062)
SLSA provenance and SBOMs: added SLSA (Supply-chain Levels for Software Artifacts) provenance to release binaries and container images. Additionally, enabled Software Bill of Materials (SBOM) generation within the GoReleaser pipeline for improved dependency transparency. (#10048, #10074)
Password leak prevention: fixed a potential security risk where PostgreSQL could leak role passwords in the logs during specific reconciliation phases. (#9950)
Changes
Updated the default PostgreSQL version to 18.3 (image 18.3-system-trixie). (#10090)
Fixes
Fixed an issue where fencing annotations could not be processed when the WAL disk was full, because the disk space check blocked the instance manager from starting. The check is now performed later in the lifecycle loop, after fencing is evaluated. (#10302)
Fixed an issue where replicas would get stuck in a Pending state if the VolumeSnapshot used for the initial bootstrap had been deleted. The operator now validates snapshot existence before use; if a snapshot is missing, it attempts to use the next available candidate or falls back to pg_basebackup. (#10192)
Prevented the "supervised primary" rollout strategy from consuming all available rollout slots, which previously caused delays in scheduled updates. Contributed by @ermakov-oleg. (#9977)
Fixed an issue where certain hot-standby parameter changes were not being correctly applied to replica clusters. (#9952)
Fixed a bug in the CNPG-I reconciler hook that could lead to skipping subsequent plugins when a "continue" result was returned. Contributed by @sharifmshaker. (#9978)
Fixed a deadlock scenario that occurred when attempting to resize a filesystem on a PVC that was not currently attached to a Pod. Contributed by @jmealo. (#9981)
Fixed webhook validation of bootstrap recovery sources to accept external clusters configured with ConnectionParameters (for pg_basebackup-based recovery). Previously, these were incorrectly rejected unless a Barman object store or CNPG-i plugin was also configured. (#10268)
Volume names for extensions and tablespaces are now prefixed to avoid naming collisions with standard cluster volumes. (#9973)
When hibernating a non-healthy cluster, the operator now reports a WaitingForHealthy condition, making the deferred hibernation state visible through cnpg status. (#10193)
Fixed fencing to work correctly even when the target pod does not exist. Fencing operates on a cluster-level annotation and should not depend on pod existence; instance name validation is now performed only in the cnpg fencing on command. (#10035)
Fixed the cluster and pooler service reconcilers to correctly handle changes to all spec fields when using the patch update strategy. The reconciler now uses RFC 7386 JSON Merge Patching, preventing cloud-provider-set fields (such as loadBalancerClass) from being inadvertently removed. (#10190, #10311)
Fixed a race condition in the deprecated in-tree Barman Cloud backup implementation affecting parallel WAL restore, where prefetched files could be read while still being downloaded, causing PostgreSQL recovery to fail with "invalid checkpoint record" errors. (#10285)
Fixed the timeline history file validation to also apply to plugin-based WAL restore. Previously, the protection introduced in #9650 only covered in-tree restores, allowing plugins to bypass the check and download future timeline history files, causing timeline mismatch errors on replicas. (#9849)
cnpg plugin:
The cnpg plugin now correctly propagates ImagePullSecrets to the pgbench Job pod template. (#10174)
Added support for Azure's DefaultAzureCredential authentication mechanism for backup and recovery operations. This can be enabled by setting azureCredentials.useDefaultAzureCredentials: true in the backup configuration, simplifying authentication in Azure environments without requiring explicit storage account keys or SAS tokens. (#9468)
Fixes
Fixed validation of PostgreSQL extension names containing underscores (e.g., pg_partman, pg_ivm). Extension names with underscores are automatically sanitized to use hyphens for Kubernetes volume names while preserving the original name in mount paths. Webhook validation prevents naming conflicts after sanitization. Contributed by @shusaan. (#9386)
Fixed a critical issue where the TimelineID in the cluster status was not reset to 1 after a major version upgrade. Because pg_upgrade initializes a new timeline, keeping the old ID (e.g., timeline 2) caused replicas to attempt to restore incompatible history files from object storage, leading to fatal "requested timeline is not a child of this server's history" errors. (#9830)
Fixed a bug where replicas could enter a crash-loop by attempting to download timeline history files from future timelines. This occurred when stale files remained in the WAL archive from a previous cluster life, and replicas would incorrectly try to fetch them during recovery. (#9650)
Fixed a race condition in replica_cluster setups duri
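The extension-name fix (#9386) describes two steps. First, underscores are sanitized to hyphens for the volume name. Second, configurations where two names collide after sanitization are rejected. The following Go sketch shows those two steps. The function names are hypothetical, and the sketch ignores any additional volume-name prefixing and the other Kubernetes naming rules.

```go
package main

import (
	"fmt"
	"strings"
)

// volumeNameFor replaces underscores with hyphens, since Kubernetes volume
// names cannot contain underscores. The original extension name would still
// be used for the mount path.
func volumeNameFor(extension string) string {
	return strings.ReplaceAll(extension, "_", "-")
}

// checkVolumeNameConflicts rejects lists where two different extension names
// sanitize to the same volume name, e.g. "pg_ivm" and "pg-ivm".
func checkVolumeNameConflicts(extensions []string) error {
	seen := make(map[string]string, len(extensions))
	for _, ext := range extensions {
		vol := volumeNameFor(ext)
		if prev, ok := seen[vol]; ok && prev != ext {
			return fmt.Errorf("extensions %q and %q both map to volume name %q", prev, ext, vol)
		}
		seen[vol] = ext
	}
	return nil
}

func main() {
	fmt.Println(volumeNameFor("pg_partman"))
	fmt.Println(checkVolumeNameConflicts([]string{"pg_ivm", "pg-ivm"}))
}
```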
✂ Note
PR body was truncated to here.
Configuration
📅 Schedule: (UTC)
Branch creation
At any time (no schedule defined)
Automerge
At any time (no schedule defined)
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
If you want to rebase/retry this PR, check this box
renovateBot
changed the title
fix(deps): update module github.com/cloudnative-pg/cloudnative-pg to v1.28.3 [security]
fix(deps): update module github.com/cloudnative-pg/cloudnative-pg to v1.28.3 [security] - autoclosed
May 18, 2026
renovateBot
deleted the
renovate/go-github.qkg1.top-cloudnative-pg-cloudnative-pg-vulnerability
branch
May 18, 2026 08:39
This PR contains the following updates:
github.com/cloudnative-pg/cloudnative-pg: v1.25.1 → v1.28.3
Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
CloudNativePG's metrics exporter allows privilege escalation to PostgreSQL superuser and OS RCE
CVE-2026-44477 / GHSA-423p-g724-fr39
More information
Details
Impact
The CloudNativePG metrics exporter opens its PostgreSQL connection as the postgres superuser via the pod-local Unix socket, then demotes the session with SET ROLE pg_monitor. SET ROLE changes only current_user; session_user remains postgres. That residual superuser identity is the foothold for the rest of the chain.
Any SQL expression evaluated inside the scrape session can invoke RESET ROLE to recover real superuser privileges, then use COPY ... TO PROGRAM to spawn an OS-level subprocess as the postgres user inside the primary pod. The READ ONLY transaction flag does not block this; it gates writes to database state, not external processes.
Two exploitation paths follow from this root cause.
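The SET ROLE / RESET ROLE point can be made concrete with a toy Go model of the identities a PostgreSQL session carries. This is a sketch under stated assumptions, not PostgreSQL's implementation. The session type, the superusers map, and the method names are hypothetical, and the model skips the role-membership checks that SET ROLE performs in the real server. It shows only that SET ROLE changes current_user while session_user stays put, so RESET ROLE brings back whatever session_user was.

```go
package main

import "fmt"

// session is a toy model of the identities a PostgreSQL connection carries.
// It is NOT PostgreSQL's implementation; it captures only the rules the
// advisory relies on and skips role-membership checks entirely.
type session struct {
	sessionUser string // the logged-in role; SET ROLE never changes it
	currentUser string // changed by SET ROLE; used for privilege checks
}

// superusers is a hypothetical role catalog for this example.
var superusers = map[string]bool{"postgres": true}

// newSession models logging in as the given role.
func newSession(login string) *session {
	return &session{sessionUser: login, currentUser: login}
}

// setRole mimics SET ROLE: only current_user changes.
func (s *session) setRole(role string) { s.currentUser = role }

// resetRole mimics RESET ROLE: current_user reverts to session_user.
func (s *session) resetRole() { s.currentUser = s.sessionUser }

// isSuperuser reports whether privilege checks would see a superuser.
func (s *session) isSuperuser() bool { return superusers[s.currentUser] }

func main() {
	// Before the fix: log in as postgres, then demote with SET ROLE pg_monitor.
	before := newSession("postgres")
	before.setRole("pg_monitor")
	fmt.Println("before fix, after SET ROLE:", before.isSuperuser()) // false
	// Any expression evaluated during the scrape could run RESET ROLE.
	before.resetRole()
	fmt.Println("before fix, after RESET ROLE:", before.isSuperuser()) // true

	// After Patch 2: log in as the dedicated non-superuser role.
	after := newSession("cnpg_metrics_exporter")
	after.resetRole()
	fmt.Println("after fix, after RESET ROLE:", after.isSuperuser()) // false
}
```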
Path 1: custom metric queries with unqualified identifiers (all supported releases)
A database user who owns a schema on the search_path of any scraped database can plant a shadow object whose name matches an unqualified identifier in a custom metric query. When the exporter next evaluates that query, the shadow expression executes inside the session_user = postgres scrape session, giving the attacker PostgreSQL superuser privileges and OS command execution inside the primary pod within one scrape interval (≤30 s). Exploitability requires a custom metric query that contains an unqualified relation or function reference.
Although search_path shadowing of unqualified identifiers is the most direct case, the underlying bug is that any expression evaluated inside the scrape session is a superuser code path. Other exploitable shapes include user-defined functions, operators or casts resolved during the scrape, joins or subqueries against user-owned tables and views, and index expressions or RLS policies on read-touched objects.
Path 2: stock default-monitoring.yaml (all supported releases, no custom metrics required)
The pg_extensions metric shipped in default-monitoring.yaml used an unqualified current_database() call and ran against every user database (target_databases: '*'). Any non-superuser who owns a user database (including the default app role created by bootstrap.initdb) could shadow current_database() and trigger the full escalation chain against a stock CNPG deployment on the first scrape after the shadow was planted.
Combined impact
The chain yields privilege escalation from a low-privileged database role (e.g. the default app role) to PostgreSQL superuser, plus arbitrary OS command execution as the postgres user inside the primary pod, all within one scrape interval. A web application SQL injection vulnerability in an app backed by a CNPG cluster is therefore sufficient to pivot to database-pod RCE.
Who is impacted
Patches
Three separate patches address the vulnerability.
Patch 1: PR #10576 "schema-qualify catalog references in default monitoring queries and documentation samples"
Schema-qualifies all unqualified pg_catalog function and view references in the shipped default-monitoring.yaml and in documentation examples. This closes Path 2 in operator-shipped configuration and removes the unqualified-identifier attack surface from all operator-shipped metric queries. Operators who clone or copy default-monitoring.yaml into custom monitoring ConfigMaps, or have copy-pasted unqualified queries elsewhere, must re-qualify those queries themselves.
Backported to all currently supported releases:
Patch 2: "dedicated cnpg_metrics_exporter role with pg_ident.conf peer mapping"
Introduces a dedicated cnpg_metrics_exporter PostgreSQL role (granted pg_monitor, no superuser privileges) and maps it in pg_ident.conf via peer authentication on the local Unix socket, following the same pattern already used for cnpg_pooler_pgbouncer. The metrics exporter connects as this role instead of postgres, so session_user is never a superuser and RESET ROLE has no escalation effect. This eliminates the root cause entirely.
Demoting the session at the SQL level (via SET SESSION AUTHORIZATION pg_monitor) is not sufficient: the privilege check for SET SESSION AUTHORIZATION is whether the authenticated user is a superuser, not the current session_user. With the connection still authenticated as postgres, any SQL in the session can run RESET SESSION AUTHORIZATION and recover the original superuser identity. This is the same recovery primitive as RESET ROLE, one layer up. Only changing the authenticated user closes the loop.
With this change in place, the original chain breaks at every step: RESET ROLE and RESET SESSION AUTHORIZATION cannot recover superuser, and COPY ... TO PROGRAM requires a privilege pg_monitor does not grant. As defense in depth, the monitoring transaction also prepends pg_catalog to the connection's search_path, so unqualified catalog identifiers cannot resolve to user-planted shadow objects.
This patch changes the connection identity but not how queries are evaluated. Custom metric queries within pg_monitor's scope (catalog reads, pg_stat_* views, settings) continue to work without modification. Queries that previously relied on superuser-level access (reading user-owned tables not granted to cnpg_metrics_exporter, or superuser-only catalogs such as pg_authid or pg_subscription) will fail and need explicit GRANT statements to cnpg_metrics_exporter.
The role is created and maintained with PASSWORD NULL; any password set out-of-band is cleared on the next reconcile, so the role cannot be authenticated by password regardless of operator pre-creation.
For replica clusters, upgrade the source primary cluster before any replica clusters that consume from it. The cnpg_metrics_exporter role is created on the source primary and replicates downstream; a replica cluster upgraded first will scrape against a missing role until the source primary upgrades or the role is created manually (see the monitoring documentation).
The patch will be backported to all currently supported releases:
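Patch 2's argument about SET SESSION AUTHORIZATION can be modeled the same way. In this hypothetical Go sketch (type and function names are illustrative), the permission check looks at the authenticated user, as the advisory states. RESET SESSION AUTHORIZATION returns to that authenticated identity. The sketch assumes that a non-superuser login may only set the session authorization to itself, and it is not a reimplementation of PostgreSQL's checks.

```go
package main

import (
	"errors"
	"fmt"
)

// authSession is a toy model (not PostgreSQL code) that separates the
// authenticated user from session_user and current_user.
type authSession struct {
	authenticatedUser string // fixed at login; SQL cannot change it
	sessionUser       string // changed by SET SESSION AUTHORIZATION
	currentUser       string // used for privilege checks
}

// isSuper is a hypothetical role catalog: only postgres is a superuser.
func isSuper(role string) bool { return role == "postgres" }

// login models authenticating as role.
func login(role string) *authSession {
	return &authSession{authenticatedUser: role, sessionUser: role, currentUser: role}
}

// setSessionAuthorization applies the rule described in the advisory: the
// check uses the authenticated user, not the current session_user. A
// non-superuser login may only "switch" to itself.
func (s *authSession) setSessionAuthorization(role string) error {
	if role != s.authenticatedUser && !isSuper(s.authenticatedUser) {
		return errors.New("permission denied to set session authorization")
	}
	s.sessionUser, s.currentUser = role, role
	return nil
}

// resetSessionAuthorization restores the authenticated identity.
func (s *authSession) resetSessionAuthorization() {
	s.sessionUser, s.currentUser = s.authenticatedUser, s.authenticatedUser
}

func main() {
	// SQL-level demotion while still authenticated as postgres.
	s := login("postgres")
	if err := s.setSessionAuthorization("pg_monitor"); err != nil {
		panic(err)
	}
	fmt.Println("demoted, superuser:", isSuper(s.currentUser)) // false
	s.resetSessionAuthorization()                               // the "one layer up" recovery
	fmt.Println("after RESET, superuser:", isSuper(s.currentUser)) // true

	// Authenticated as the dedicated role: there is nothing to recover.
	e := login("cnpg_metrics_exporter")
	fmt.Println("exporter can become postgres:", e.setSessionAuthorization("postgres") == nil) // false
}
```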
Workarounds
If upgrading immediately is not possible:
Schema-qualify all identifiers in custom metric queries. Use explicit pg_catalog. prefixes for all catalog functions and views (e.g. pg_catalog.current_database(), pg_catalog.now()). This is a partial mitigation: it closes the search_path-shadowing shape in operator- and user-supplied metric bodies, but other expression shapes (user-defined functions, operators or casts; joins or subqueries on user-owned tables and views; RLS policies on read-touched objects) remain superuser code paths until Patch 2 lands.
Restrict database ownership. Ensure only fully trusted roles own user databases in scraped clusters. The exploit requires the ability to plant an object on the metrics exporter's search_path in a scraped database, typically by owning the database (and therefore public via pg_database_owner) or by holding CREATE on a schema already reachable through search_path.
PG <15 caveat: public grants CREATE to PUBLIC by default before PostgreSQL 15, so any authenticated role in a scraped database can plant a shadow object regardless of ownership.
Limit the scope of target_databases: '*' queries. Avoid target_databases: '*' unless every database in the cluster, and every role that owns one, is fully trusted. Where possible, restrict target_databases to specific, known-safe databases.
Do not expose metric query SQL to untrusted users. Multi-tenant platforms that allow customers to supply or influence custom metric query bodies should treat this as a critical trust boundary until the architectural fix is released.
References
cnpg_metrics_exporter role with pg_ident.conf peer mapping"
Severity
CVSS:4.0/AV:N/AC:L/AT:N/PR:L/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
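The CVSS 4.0 vector in the Severity section can be unpacked mechanically. This Go sketch (the function name is hypothetical) splits it into metric/value pairs. It checks only the CVSS:4.0 prefix, the key:value shape, and duplicate keys, not whether each metric or value is defined by the specification. In this vector, PR:L means low privileges are required. VC/VI/VA and SC/SI/SA, the confidentiality, integrity, and availability impacts on the vulnerable and subsequent systems respectively, are all rated H (high).

```go
package main

import (
	"fmt"
	"strings"
)

// parseCVSS4 splits a CVSS v4.0 vector into metric -> value pairs. It checks
// only the "CVSS:4.0" prefix, the key:value shape, and duplicate keys; it
// does not validate metric names or values against the specification.
func parseCVSS4(vector string) (map[string]string, error) {
	parts := strings.Split(vector, "/")
	if parts[0] != "CVSS:4.0" {
		return nil, fmt.Errorf("not a CVSS 4.0 vector: %q", vector)
	}
	metrics := make(map[string]string, len(parts)-1)
	for _, p := range parts[1:] {
		key, value, ok := strings.Cut(p, ":")
		if !ok || key == "" || value == "" {
			return nil, fmt.Errorf("malformed metric %q", p)
		}
		if _, dup := metrics[key]; dup {
			return nil, fmt.Errorf("duplicate metric %q", key)
		}
		metrics[key] = value
	}
	return metrics, nil
}

func main() {
	m, err := parseCVSS4("CVSS:4.0/AV:N/AC:L/AT:N/PR:L/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H")
	if err != nil {
		panic(err)
	}
	// AV:N network attack vector, PR:L low privileges required,
	// SC:H high confidentiality impact on subsequent systems.
	fmt.Println(m["AV"], m["PR"], m["SC"]) // N L H
}
```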
Release Notes
cloudnative-pg/cloudnative-pg (github.com/cloudnative-pg/cloudnative-pg)
v1.28.3 Compare Source
Release date: May 8, 2026
Security and Supply Chain
CVE-2026-44477/GHSA-423p-g724-fr39: metrics exporter privilege escalation: the metrics exporter no longer authenticates as the postgres superuser. It now uses a dedicated cnpg_metrics_exporter role with pg_monitor privileges only, closing a chain that let a low-privilege database user gain PostgreSQL superuser. (GHSA-423p-g724-fr39)
Upgrade impact: custom monitoring queries that read user-owned tables, or use target_databases: '*' against databases where PUBLIC CONNECT has been revoked, need explicit GRANT statements to cnpg_metrics_exporter. See "Custom query privileges and safety" and "Manually creating the metrics exporter role" in the monitoring documentation.
For replica clusters, upgrade the source primary cluster before any replica clusters that consume from it. The cnpg_metrics_exporter role is created on the source primary and replicates downstream; a replica cluster upgraded first will scrape against a missing role until the source primary upgrades. The manual-recovery section linked above also covers replica clusters.
Schema-qualified catalog references in default monitoring queries: hardened the shipped monitoring configuration and documentation samples by qualifying every pg_catalog object explicitly. Unqualified references resolve through search_path, which a database user can manipulate to shadow built-in objects. (#10576)
Discoverable SBOM and provenance attestations: SBOM and SLSA provenance attached to operator container images now follow the OCI 1.1 Referrers spec, so standard registry tooling and supply-chain scanners can discover them automatically. (#10601)
CVE remediation in github.com/jackc/pgx/v5: bumped to v5.9.2 to pick up upstream fixes for CVE-2026-33816 (memory-safety in pgproto3) and GHSA-j88v-2chj-qfwx (SQL injection via simple-protocol dollar-quoted string handling). (#10436, #10498)
CVE remediation in the Go runtime: built with Go 1.26.3 to pick up upstream fixes in crypto/x509, crypto/tls, net/http, and net (CVE-2026-32280, CVE-2026-32281, CVE-2026-33810, CVE-2026-33814, CVE-2026-33811, CVE-2026-39825). (#10462, #10647)
Build pipeline hardening: the Go 1.26.3 bump also addresses CVE-2026-42501 (cmd/go module-checksum validation), reducing supply-chain exposure during release builds. The affected code paths are not reachable from the running operator. (#10647)
Changes
VerifyPeerCertificate to VerifyConnection, which runs on every completed handshake (the former is skipped on resumed TLS 1.3 sessions). Session resumption is not enabled in CloudNativePG today, so this has no observable effect, but it future-proofs verification if session caching is introduced later. (#10478)
Fixes
Fixed a failover window where the former primary kept its primary label. If it returned during failover (for example, after a transient network partition), the -rw service kept routing to it, replicas could reconnect, and committed writes were lost to pg_rewind. The old primary is now labeled unhealthy to isolate it from service traffic during failover. (#10409)
Fixed failover not being triggered when the node hosting the primary becomes unreachable. The operator now reads the pod's Ready condition (flipped to False by the node controller when the kubelet stops reporting) instead of ContainersReady, which stays stale as True in that scenario. Combined with the spurious-failover guard (#10445), failover triggers only when Kubernetes itself marks the pod not Ready. (#10448)
Fixed spurious failovers caused by transient failures on the primary's HTTP status endpoint. (#10445)
Fixed escaping of backslashes and control characters in PostgreSQL configuration values. Previously, such characters in parameters like log_line_prefix could corrupt the configuration file or be silently stripped at runtime. (#10515)
Fixed restore_command construction to shell-quote each argument. Values such as a destinationPath containing whitespace (for example, s3://my bucket/wal) were word-split by the POSIX shell and passed to the WAL restore tool as separate arguments. (#10518)
Tightened recoveryTarget validation in the admission webhook: targetXID must now be a non-negative 32-bit integer, and targetName must be shorter than 64 bytes and free of ASCII control characters. Malformed values are rejected at admission instead of failing later during PostgreSQL recovery. (#10565)
Fixed snapshot restores failing when leftover pgsql_tmp* directories were present in the data directory. (#10447)
Fixed a deadlock occurring when PVC storage size and resource requests are changed simultaneously. (#10427)
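The restore_command fix rests on standard POSIX shell quoting. The release note does not say how the operator quotes, so this minimal Go sketch assumes single-quote wrapping. Wrapping an argument in single quotes stops word-splitting, and an embedded single quote is written as '\''. The program name wal-restore is a hypothetical placeholder. The sketch does not handle PostgreSQL's own % escapes in restore_command (%f, %p, and %% for a literal percent sign), which the server expands before the shell sees the string.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes so a POSIX shell passes it as one
// literal argument. Each embedded single quote becomes '\'' : close the
// quoted string, emit an escaped quote, and reopen.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// buildCommand quotes the program and every argument, then joins them with
// spaces. "wal-restore" below is a hypothetical program name.
func buildCommand(program string, args ...string) string {
	quoted := make([]string, 0, len(args)+1)
	quoted = append(quoted, shellQuote(program))
	for _, a := range args {
		quoted = append(quoted, shellQuote(a))
	}
	return strings.Join(quoted, " ")
}

func main() {
	// Without quoting, "s3://my bucket/wal" would be split into two words.
	fmt.Println(buildCommand("wal-restore", "s3://my bucket/wal", "%f", "%p"))
	// 'wal-restore' 's3://my bucket/wal' '%f' '%p'
}
```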
v1.28.2 Compare Source
Release date: Mar 31, 2026
Important changes
Enhancements
Improved the Pooler CRD with support for granular configuration of TLS cipher suites and minimum/maximum TLS versions. This enables administrators to meet strict security compliance requirements for pooler-to-client and pooler-to-server connections. Contributed by @alex1989hu. (#9571)
Improved the reliability of major upgrades by setting BackoffLimit=0 on the upgrade job, preventing unnecessary retries of a failed pg_upgrade. The operator now automatically deletes the failed job when a user reverts the container image, allowing the cluster to restart gracefully on the original version. (#10104, #10298)
Improved role management by verifying the instance is the primary before each reconciliation cycle, avoiding unnecessary reconciliation attempts and spurious error messages on read-only replicas. (#9971)
Extended the CRD schemas for Cluster, ImageCatalog, and ClusterImageCatalog to accept the extensions, bin_path, and env fields introduced in 1.29. The operator ignores these fields on older versions, but accepting them in the schema allows users to share a single manifest across clusters running different CNPG versions. (#10131, #10387)
The operator now honors the primaryUpdateMethod when adding new PVCs to a cluster, ensuring that the rollout strategy (e.g., switchover vs. restart) is respected during storage expansion or additions. (#9720)
Refined the alpha.cnpg.io/unrecoverable annotation logic to allow it to function even on pods that have not yet reached the Ready state, facilitating the recovery of stuck instances. (#9968)
Security and Supply Chain
Security best practices integration: integrated the OpenSSF baseline scanner and added a SECURITY-INSIGHTS.yaml file to the repository to align with industry-standard security reporting. (#10054, #10062)
SLSA provenance and SBOMs: added SLSA (Supply-chain Levels for Software Artifacts) provenance to release binaries and container images. Additionally, enabled Software Bill of Materials (SBOM) generation within the GoReleaser pipeline for improved dependency transparency. (#10048, #10074)
Password leak prevention: fixed a potential security risk where PostgreSQL could leak role passwords in the logs during specific reconciliation phases. (#9950)
Changes
Updated the default PostgreSQL version to 18.3 (image 18.3-system-trixie). (#10090)
Fixes
Fixed a deadlock during operator upgrades affecting clusters using synchronous replication, where pods running the old and new operator versions computed different PostgreSQL configuration hashes, causing the uniformity check to block indefinitely and preventing both rolling updates and in-place upgrades from proceeding. (#10342)
Fixed an issue where fencing annotations could not be processed when the WAL disk was full, because the disk space check blocked the instance manager from starting. The check is now performed later in the lifecycle loop, after fencing is evaluated. (#10302)
Fixed an issue where replicas would get stuck in a Pending state if the VolumeSnapshot used for the initial bootstrap had been deleted. The operator now validates snapshot existence before use; if a snapshot is missing, it attempts to use the next available candidate or falls back to pg_basebackup. (#10192)
Prevented the "supervised primary" rollout strategy from consuming all available rollout slots, which previously caused delays in scheduled updates. Contributed by @ermakov-oleg. (#9977)
Fixed an issue where certain hot-standby parameter changes were not being correctly applied to replica clusters. (#9952)
Fixed a bug in the CNPG-I reconciler hook that could lead to skipping subsequent plugins when a "continue" result was returned. Contributed by @sharifmshaker. (#9978)
Fixed a deadlock scenario that occurred when attempting to resize a filesystem on a PVC that was not currently attached to a Pod. Contributed by @jmealo. (#9981)
Fixed webhook validation of bootstrap recovery sources to accept external clusters configured with ConnectionParameters (for pg_basebackup-based recovery). Previously, these were incorrectly rejected unless a Barman object store or CNPG-i plugin was also configured. (#10268)
Volume names for extensions and tablespaces are now prefixed to avoid naming collisions with standard cluster volumes. (#9973)
When hibernating a non-healthy cluster, the operator now reports a WaitingForHealthy condition, making the deferred hibernation state visible through cnpg status. (#10193)
Fixed fencing to work correctly even when the target pod does not exist. Fencing operates on a cluster-level annotation and should not depend on pod existence; instance name validation is now performed only in the cnpg fencing on command. (#10035)
Fixed the cluster and pooler service reconcilers to correctly handle changes to all spec fields when using the patch update strategy. The reconciler now uses RFC 7386 JSON Merge Patching, preventing cloud-provider-set fields (such as loadBalancerClass) from being inadvertently removed. (#10190, #10311)
Fixed a race condition in the deprecated in-tree Barman Cloud backup implementation affecting parallel WAL restore, where prefetched files could be read while still being downloaded, causing PostgreSQL recovery to fail with "invalid checkpoint record" errors. (#10285)
Fixed the timeline history file validation to also apply to plugin-based WAL restore. Previously, the protection introduced in #9650 only covered in-tree restores, allowing plugins to bypass the check and download future timeline history files, causing timeline mismatch errors on replicas. (#9849)
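The timeline-history checks in #9650 and #9849 amount to refusing history files for timelines the cluster has not reached. The Go sketch below illustrates that idea under assumptions. The function name and the maxTimeline bound are hypothetical; one plausible bound is the timeline the operator currently records for the cluster. The sketch relies only on PostgreSQL's naming convention for history files: eight hexadecimal digits followed by .history. It is not the operator's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// historyRe matches PostgreSQL timeline history file names such as
// "00000002.history": the timeline ID as eight hexadecimal digits.
var historyRe = regexp.MustCompile(`^([0-9A-F]{8})\.history$`)

// allowHistoryFetch reports whether a requested file may be fetched from the
// WAL archive. History files for timelines above maxTimeline are refused;
// other files (WAL segments, etc.) are outside this check and allowed.
// maxTimeline is a hypothetical input, e.g. the timeline recorded in status.
func allowHistoryFetch(name string, maxTimeline uint32) (bool, error) {
	m := historyRe.FindStringSubmatch(name)
	if m == nil {
		return true, nil
	}
	tli, err := strconv.ParseUint(m[1], 16, 32)
	if err != nil {
		return false, err
	}
	return uint32(tli) <= maxTimeline, nil
}

func main() {
	for _, f := range []string{"00000002.history", "00000005.history", "000000020000000000000003"} {
		ok, err := allowHistoryFetch(f, 3)
		fmt.Println(f, ok, err)
	}
}
```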
cnpg plugin:
The cnpg plugin now correctly propagates ImagePullSecrets to the pgbench Job pod template. (#10174)
v1.28.1 Compare Source
Release date: Feb 5, 2026
Enhancements
Added support for Azure's DefaultAzureCredential authentication mechanism for backup and recovery operations. This can be enabled by setting azureCredentials.useDefaultAzureCredentials: true in the backup configuration, simplifying authentication in Azure environments without requiring explicit storage account keys or SAS tokens. (#9468)
Fixes
Fixed validation of PostgreSQL extension names containing underscores (e.g., pg_partman, pg_ivm). Extension names with underscores are automatically sanitized to use hyphens for Kubernetes volume names while preserving the original name in mount paths. Webhook validation prevents naming conflicts after sanitization. Contributed by @shusaan. (#9386)
Fixed a critical issue where the TimelineID in the cluster status was not reset to 1 after a major version upgrade. Because pg_upgrade initializes a new timeline, keeping the old ID (e.g., timeline 2) caused replicas to attempt to restore incompatible history files from object storage, leading to fatal "requested timeline is not a child of this server's history" errors. (#9830)
Fixed an issue where stale TLS status fields in the Pooler were not cleared after being removed from the specification. This was particularly critical when upgrading to v1.28.0, where the ServerTLS field was repurposed, causing PgBouncer to use incorrect certificates and resulting in "unsupported certificate" errors that blocked all application connectivity. The operator now explicitly clears ServerCA, ClientCA, ClientTLS, and ServerTLS status fields when they are no longer configured. (#9397)
Fixed a bug where replicas could enter a crash-loop by attempting to download timeline history files from future timelines. This occurred when stale files remained in the WAL archive from a previous cluster life, and replicas would incorrectly try to fetch them during recovery. (#9650)
Fixed a race condition in replica_cluster setups during designated primary transitions, preventing transient "no primary" states in the replica cluster. (#9601)
The backup controller now uses the unique instance session ID to detect instance manager restarts. This prevents the operator from incorrectly assuming a backup is still progressing if the underlying container has crashed and restarted, which previously led to orphaned backup objects. (#9370)
Fixed a validation gap in Azure object store configurations where the storageAccount was not required when using explicit credentials (such as a storage key or SAS token). The operator now enforces that a storage account name is provided in these cases and that connectionString is mutually exclusive with other authentication parameters. (#9604)
Optimized the deletion path so the operator begins cleaning up resources immediately when a cluster is marked for deletion. This significantly reduces the time a cluster remains in Terminating status while waiting for internal reconciliation loops. (#9555)
Fixed an issue where replication slots were not properly dropped from replicas when the feature was disabled or the cluster was reconfigured. This ensures that unused slots do not cause WAL build-up on the primary. (#9381)
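The Azure validation rules from #9604 can be sketched as a pure function. The field names are modeled on the azureCredentials stanza, but the sketch is simplified in two ways. First, each credential is a plain string here, whereas a real manifest references Kubernetes secrets. Second, validate_azure_credentials is a hypothetical name, not the operator's webhook.

```python
# Simplified, hypothetical view of azureCredentials: values are plain
# strings or booleans instead of secret references.
AUTH_FIELDS = (
    "storageAccount",
    "storageKey",
    "storageSasToken",
    "inheritFromAzureAD",
    "useDefaultAzureCredentials",
)


def validate_azure_credentials(creds: dict) -> list:
    """Return validation errors following the rules described in #9604."""
    errors = []
    if creds.get("connectionString"):
        # A connection string carries its own credentials, so it cannot be
        # combined with any other authentication parameter.
        others = [field for field in AUTH_FIELDS if creds.get(field)]
        if others:
            errors.append(
                "connectionString is mutually exclusive with: " + ", ".join(others)
            )
    explicit = creds.get("storageKey") or creds.get("storageSasToken")
    if explicit and not creds.get("storageAccount"):
        # Explicit keys and SAS tokens only make sense for a named account.
        errors.append("storageAccount is required with storageKey or storageSasToken")
    return errors


print(validate_azure_credentials({"storageAccount": "acct", "storageKey": "k"}))
print(validate_azure_credentials({"storageSasToken": "t"}))
```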
Fixed an issue where imagePullSecrets were not added to the ServiceAccount created for the Pooler. Previously, these secrets were applied to the Deployment but not to the ServiceAccount, which caused image pull failures in restricted environments using certain security policies. (#9427)
Added a check to verify ownership before the operator deletes a PodMonitor. This prevents the operator from accidentally deleting manually managed monitoring resources that happen to share a name with expected CNPG resources. Contributed by @juliamertz. (#9340)
Fixed a bug where pg_stat_archiver metrics would continue to report stale data on standby instances after a switchover. The exporter now skips these metrics on standbys, as PostgreSQL only provides valid archiver stats on the primary. (#9411)
Clarified the interpretation of timestamp formats for recovery targetTime. Timestamps provided without an explicit timezone are now consistently interpreted as UTC. Contributed by @pchovelon. (#8937)
Fixed backup status updates to prevent "resource has been modified" errors during concurrent updates. (#9551)
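The UTC rule for targetTime (#8937) is sketched below. The sketch uses Python's datetime.fromisoformat, so it accepts only ISO 8601-style strings, and before Python 3.11 it rejects a trailing Z. The formats CloudNativePG actually accepts may differ. parse_target_time is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_target_time(value: str) -> datetime:
    """Hypothetical parser: a timestamp without an explicit offset is
    interpreted as UTC, as described in #8937."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Treat a naive wall-clock time as UTC rather than as local time.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


print(parse_target_time("2025-12-09 10:00:00"))
# 2025-12-09 10:00:00+00:00
print(parse_target_time("2025-12-09T10:00:00+02:00").astimezone(timezone.utc))
# 2025-12-09 08:00:00+00:00
```

An explicit offset is still honored. Only naive timestamps are pinned to UTC, which removes any dependency on the server's local timezone.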
Fixed event reporting to use the correct pod name when a backup pod is not found. (#9552)
Improved performance of scheduled backup operations for clusters with a very high number of historical backups. (#9489)
Fixed error handling when removing finalizers on Database objects. (#9431)
cnpg plugin:
Updated the status command to display "Disabled" when the skipWalArchiving annotation is present on a cluster. This replaces confusing "starting up" or "unknown" states when WAL archiving is intentionally bypassed. (#9709)
Fixed the logs --follow command to continue polling for new pods instead of exiting prematurely when all current log streams complete. (#9599)
v1.28.0
Release date: Dec 9, 2025
Features
Quorum-Based Failover Promoted to Stable: Promoted the quorum-based failover feature, introduced experimentally in 1.27.0, to a stable API. This data-driven failover mechanism is now configured via the spec.postgresql.synchronous.failoverQuorum field, graduating from the previous alpha.cnpg.io/failoverQuorum annotation. (#8589)
Declarative Foreign Data Management: Introduced comprehensive declarative management for Foreign Data Wrappers (FDW) by extending the Database CRD. This feature adds the .spec.fdws and .spec.servers fields, allowing you to manage FDW extensions and their corresponding foreign servers directly from the Database resource. This work was implemented by Ying Zhu (@EdwinaZhu) as part of the LFX Mentorship Program 2025 Term 2. (#7942, #8401)
Changes
Updated the default PostgreSQL version to 18.1-system-trixie. (#9178)
Updated the default PgBouncer version to 1.25.1 for new Pooler deployments. (#9367)
Enhancements
Enabled simultaneous image and configuration changes when using primaryUpdateMethod: restart, allowing you to update the container image (including PostgreSQL version or extensions) and PostgreSQL configuration settings in the same operation. Note that when using primaryUpdateMethod: switchover, image and configuration changes must still be performed separately to avoid configuration mismatches during the switchover process. (#8241)
Improved network failure detection for replica instances by setting the default tcp_user_timeout to 5 seconds. This change helps replicas detect and recover from silent network drops more quickly. Previously, replicas could wait up to 127 seconds before detecting such failures; with the new timeout, they reconnect to the primary within 5 seconds. To preserve the previous behavior, set STANDBY_TCP_USER_TIMEOUT to 0 in the operator configuration. (#9317)
Adopted standard Kubernetes recommended labels (e.g., app.kubernetes.io/name) for all resources generated by CloudNativePG (Clusters, Backups, Poolers, etc.). Contributed by @JefeDavis. (#8087)
Introduced securityContext at the pod level and containerSecurityContext for individual containers (including postgres, init, and sidecars). This provides granular control over security settings, replacing the previous cluster-wide postgres and operator user settings. Contributed by @x0ddf. (#6614)
Introduced the alpha.cnpg.io/unrecoverable=true annotation for replica pods. When applied, this annotation instructs the operator to permanently delete the instance by removing its Pod and PVCs, after which it will recreate the replica from the primary. (#8178)
Introduced a new caching layer for user-defined monitoring queries to reduce load on the PostgreSQL database. (#8003)
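The idea behind the monitoring-query cache (#8003) can be illustrated with a time-to-live (TTL) cache keyed by query name. The release note does not describe how CloudNativePG's cache is keyed or invalidated. QueryResultCache and its TTL policy are therefore hypothetical and show only the general technique.

```python
import time
from typing import Callable, Dict, Tuple


class QueryResultCache:
    """Hypothetical TTL cache for user-defined monitoring query results."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable for testing
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, query_name: str, run_query: Callable[[], object]) -> object:
        """Return a cached result if it is fresh, otherwise run the query."""
        now = self.clock()
        entry = self._entries.get(query_name)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh enough: skip the database round trip
        result = run_query()
        self._entries[query_name] = (now, result)
        return result
```

Scrapes that arrive within the TTL then reuse the previous result instead of re-running the SQL. The trade-off is that metrics can be up to one TTL old.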
Enhanced PgBouncer integration by automatically setting auth_dbname to the pgbouncer database, simplifying auth setup. (#8671)
Allowed providing stage-specific pg_restore options (preRestore, postRestore, dataRestore) during database import. Contributed by @hanshal101. (#7690)
Added the PostgreSQL majorVersion to the Backup object's status for easier identification and management. (#8464)
Enhanced cluster restore to wait for all init containers to complete before starting the restore process. This ensures that backup tools running in init containers finish preparing the data before the restore begins. The implementation correctly handles Kubernetes init container sidecars by ignoring those with RestartPolicy=Always. (#9026)
Added the PGBOUNCER_IMAGE_NAME operator configuration parameter to allow overriding the default PgBouncer image. This is useful for air-gapped environments or when using internal registries. (#9232)
cnpg plugin:
Added a --timeout flag to the kubectl cnpg status command for configuring the timeout for filesystem operations such as calculating cluster size. The default remains 10 seconds but can be adjusted for large clusters where operations may take longer. (#9201)
Improved cnpg report to generate more shell-friendly file names. (#8984)
Security
Allowed providing fine-grained custom TLS configurations for PgBouncer. The Pooler CRD was extended with clientTLSSecret, clientCASecret, serverTLSSecret, and serverCASecret fields under .spec.pgbouncer. These fields enable users to supply their own certificates for both client-to-pooler and pooler-to-server connections, taking precedence over the operator-generated certificates. (#8692)
Added optional TLS support for the operator's metrics server (port 8080). This feature is opt-in and enabled by setting the METRICS_CERT_DIR environment variable, which instructs the operator to look for tls.crt and tls.key files in the specified directory. When unset, the server continues to use HTTP for backward compatibility. (#8997)
Enabled cnpg report operator to work with minimal permissions by making only the operator deployment required. All other resources (pods, secrets, config maps, events, webhooks, and OLM data) are now optional and collected on a best-effort basis. The command gracefully handles permission errors for those resources by logging clear warnings and continuing report generation with available data, rather than failing completely. This enables least-privileged access, where users may have limited, namespace-scoped permissions. (#8982)
Fixes
Improved resilience of all probe types (liveness, readiness, and startup) to transient Kubernetes API server connectivity issues. Probes now use a caching mechanism that falls back to cached cluster definitions during brief network interruptions, preventing unnecessary pod restarts and probe failures. (#9148)
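The probe fallback from #9148 follows a last-known-good caching pattern, sketched below. ClusterDefinitionSource is hypothetical. ConnectionError stands in for whatever error the Kubernetes client raises when the API server is unreachable.

```python
class ClusterDefinitionSource:
    """Hypothetical wrapper around an API fetch with a last-known-good cache."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that queries the Kubernetes API
        self._cached = None

    def get(self):
        """Return the latest cluster definition, falling back to the cached
        copy when the API server is briefly unreachable."""
        try:
            self._cached = self._fetch()
        except ConnectionError:
            if self._cached is None:
                raise  # nothing to fall back on yet
            # Transient outage: reuse the last definition we successfully read.
        return self._cached
```

A probe backed by this wrapper keeps answering during a short API outage instead of failing and triggering a pod restart. It still fails if it has never seen a definition.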
Fixed the CheckEmptyWalArchive safeguard to run correctly when restoring from a volume snapshot using CNPG-I backup/WAL plugins (e.g., plugin-barman-cloud). Previously, this check was skipped for plugin-based implementations. (#9306)
Improved error reporting when ImageCatalog retrieval fails. The operator now emits a Warning event and logs errors for all failure types, not just NotFound errors, improving visibility into configuration issues. (#9266)
Fixed TLS certificate verification issues when connecting to CNPG-I plugins by adding the cnpg.io/pluginServerName annotation. This allows customizing the DNS name used for certificate verification in environments where the plugin's certificate uses a different DNS name than the Service name. (#9222)
Fixed an issue where the instance manager controller could fail to restart after an error, reporting a "controller already exists" message. The controller now uses SkipNameValidation for subsequent initialization attempts. Contributed by @mateusoliveira43. (#9123)
Fixed incorrect WAL restore path handling in plugins when the destination path is absolute, preventing path duplication issues. Contributed by @Endevir. (#9093)
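The path-duplication problem fixed in #9093 can be sketched as follows. The sketch resolves a restore destination only when it is relative. PostgreSQL's %p placeholder in restore_command is normally relative to the data directory. resolve_restore_destination is a hypothetical helper, not the plugin interface.

```python
import posixpath


def resolve_restore_destination(pgdata: str, destination: str) -> str:
    """Hypothetical helper: return where a restored WAL file should be written."""
    if posixpath.isabs(destination):
        # An absolute destination is already complete; prefixing PGDATA again
        # would duplicate the path (e.g. /pgdata/pgdata/pg_wal/...).
        return destination
    # Relative destinations are resolved under the data directory.
    return posixpath.join(pgdata, destination)


print(resolve_restore_destination("/pgdata", "pg_wal/RECOVERYXLOG"))
print(resolve_restore_destination("/pgdata", "/pgdata/pg_wal/RECOVERYXLOG"))
# Both print /pgdata/pg_wal/RECOVERYXLOG
```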
Fixed the CREATE PUBLICATION SQL generation for multi-table publications to be backward-compatible with PostgreSQL 13+. The previously generated syntax was only valid for PostgreSQL 15+ and caused syntax errors on older versions. (#8888)
Fixed backup failures in complex pod definitions by reliably selecting the postgres container by name instead of by index. Contributed by @Joda89. (#8964)
cnpg plugin:
cnpg report log collection, especially when fetching previous logs. The collector now correctly fetches previous and current logs in separate requests and gracefully handles missing previous logs (e.g., on containers with no restart history), ensuring current logs are always collected. (#8992)
Supported versions
v1.27.4
Release date: Mar 31, 2026
Warning: This is the final release in the 1.27.x series. Users are strongly encouraged to upgrade to a newer minor version, as 1.27 is no longer supported.
Important changes
Enhancements
Improved the Pooler CRD with support for granular configuration of TLS cipher suites and minimum/maximum TLS versions. This enables administrators to meet strict security compliance requirements for pooler-to-client and pooler-to-server connections. Contributed by @alex1989hu. (#9571)
Improved the reliability of major upgrades by setting BackoffLimit=0 on the upgrade job, preventing unnecessary retries of a failed pg_upgrade. The operator now automatically deletes the failed job when a user reverts the container image, allowing the cluster to restart gracefully on the original version. (#10104, #10298)
Improved role management by verifying the instance is the primary before each reconciliation cycle, avoiding unnecessary reconciliation attempts and spurious error messages on read-only replicas. (#9971)
Extended the CRD schemas for Cluster, ImageCatalog, and ClusterImageCatalog to accept the extensions, bin_path, and env fields introduced in 1.29. The operator ignores these fields on older versions, but accepting them in the schema allows users to share a single manifest across clusters running different CNPG versions. (#10131, #10387)
The operator now honors the primaryUpdateMethod when adding new PVCs to a cluster, ensuring that the rollout strategy (e.g., switchover vs. restart) is respected during storage expansion or additions. (#9720)
Security and Supply Chain
Security best practices integration: integrated the OpenSSF baseline scanner and added a SECURITY-INSIGHTS.yaml file to the repository to align with industry-standard security reporting. (#10054, #10062)
SLSA provenance and SBOMs: added SLSA (Supply-chain Levels for Software Artifacts) provenance to release binaries and container images. Additionally, enabled Software Bill of Materials (SBOM) generation within the GoReleaser pipeline for improved dependency transparency. (#10048, #10074)
Password leak prevention: fixed a potential security risk where PostgreSQL could leak role passwords in the logs during specific reconciliation phases. (#9950)
Changes
18.3-system-trixie). (#10090)
Fixes
Fixed an issue where fencing annotations could not be processed when the WAL disk was full, because the disk space check blocked the instance manager from starting. The check is now performed later in the lifecycle loop, after fencing is evaluated. (#10302)
Fixed an issue where replicas would get stuck in a Pending state if the VolumeSnapshot used for the initial bootstrap had been deleted. The operator now validates snapshot existence before use; if a snapshot is missing, it attempts to use the next available candidate or falls back to pg_basebackup. (#10192)
Prevented the "supervised primary" rollout strategy from consuming all available rollout slots, which previously caused delays in scheduled updates. Contributed by @ermakov-oleg. (#9977)
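The snapshot selection described in #10192 amounts to "first candidate that still exists, otherwise pg_basebackup". The sketch below assumes that candidates arrive in order of preference and that the set of existing VolumeSnapshot names is already known. choose_bootstrap_source is a hypothetical name.

```python
from typing import Iterable, Optional, Set


def choose_bootstrap_source(
    candidates: Iterable[str], existing_snapshots: Set[str]
) -> Optional[str]:
    """Return the first candidate VolumeSnapshot that still exists, or None
    to signal a fallback to pg_basebackup."""
    return next((name for name in candidates if name in existing_snapshots), None)


source = choose_bootstrap_source(["snap-2", "snap-1"], {"snap-1"})
print(source or "pg_basebackup")  # snap-1
print(choose_bootstrap_source(["snap-2"], set()) or "pg_basebackup")  # pg_basebackup
```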
Fixed an issue where certain hot-standby parameter changes were not being correctly applied to replica clusters. (#9952)
Fixed a bug in the CNPG-I reconciler hook that could lead to skipping subsequent plugins when a "continue" result was returned. Contributed by @sharifmshaker. (#9978)
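The intended behavior behind the #9978 fix is sketched below: a "continue" result must pass control to the next plugin, and only a non-continue result should stop the chain. HookResult and its members are illustrative names, not necessarily those of the CNPG-I protocol.

```python
from enum import Enum


class HookResult(Enum):
    """Hypothetical hook outcomes; names are illustrative only."""
    CONTINUE = "continue"
    REQUEUE = "requeue"
    TERMINATE = "terminate"


def run_reconciler_hooks(plugins):
    """Call each (name, hook) pair in order and stop only when a hook
    returns something other than CONTINUE. Returns the final result and
    the names of the plugins that were invoked."""
    called = []
    for name, hook in plugins:
        called.append(name)
        result = hook()
        if result is not HookResult.CONTINUE:
            return result, called
    return HookResult.CONTINUE, called


chain = [("a", lambda: HookResult.CONTINUE), ("b", lambda: HookResult.CONTINUE)]
print(run_reconciler_hooks(chain))  # both plugins run
```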
Fixed a deadlock scenario that occurred when attempting to resize a filesystem on a PVC that was not currently attached to a Pod. Contributed by @jmealo. (#9981)
Fixed webhook validation of bootstrap recovery sources to accept external clusters configured with ConnectionParameters (for pg_basebackup-based recovery). Previously, these were incorrectly rejected unless a Barman object store or CNPG-I plugin was also configured. (#10268)
Volume names for extensions and tablespaces are now prefixed to avoid naming collisions with standard cluster volumes. (#9973)
When hibernating a non-healthy cluster, the operator now reports a WaitingForHealthy condition, making the deferred hibernation state visible through cnpg status. (#10193)
Fixed fencing to work correctly even when the target pod does not exist. Fencing operates on a cluster-level annotation and should not depend on pod existence; instance name validation is now performed only in the cnpg fencing on command. (#10035)
Fixed the cluster and pooler service reconcilers to correctly handle changes to all spec fields when using the patch update strategy. The reconciler now uses RFC 7386 JSON Merge Patching, preventing cloud-provider-set fields (such as loadBalancerClass) from being inadvertently removed. (#10190, #10311)
Fixed a race condition in the deprecated in-tree Barman Cloud backup implementation affecting parallel WAL restore, where prefetched files could be read while still being downloaded, causing PostgreSQL recovery to fail with "invalid checkpoint record" errors. (#10285)
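JSON Merge Patch (#10190) preserves provider-set fields because it only touches the members named in the patch. A JSON null deletes a member, and arrays and scalars replace the old value wholesale. The function below is a minimal sketch of that merge algorithm. The Service fragments in the usage lines are made-up examples, not output from the operator.

```python
import copy


def json_merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386 semantics) and return a new document."""
    if not isinstance(patch, dict):
        # Non-object patches (scalars, arrays) replace the target entirely.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # JSON null removes the member from the target.
            result.pop(key, None)
        else:
            # Merge recursively; members absent from the patch are kept.
            result[key] = json_merge_patch(result.get(key), value)
    return result


live = {"spec": {"type": "LoadBalancer", "loadBalancerClass": "cloud-lb",
                 "ports": [{"port": 5432}]}}
desired = {"spec": {"ports": [{"port": 5432, "name": "postgres"}]}}
print(json_merge_patch(live, desired))  # loadBalancerClass is preserved
```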
Fixed the timeline history file validation to also apply to plugin-based WAL restore. Previously, the protection introduced in #9650 only covered in-tree restores, allowing plugins to bypass the check and download future timeline history files, causing timeline mismatch errors on replicas. (#9849)
cnpg plugin:
pgbench Job pod template. (#10174)
v1.27.3
Release date: Feb 5, 2026
Enhancements
DefaultAzureCredential authentication mechanism for backup and recovery operations. This can be enabled by setting azureCredentials.useDefaultAzureCredentials: true in the backup configuration, simplifying authentication in Azure environments without requiring explicit storage account keys or SAS tokens. (#9468)
Fixes
Fixed validation of PostgreSQL extension names containing underscores (e.g., pg_partman, pg_ivm). Extension names with underscores are automatically sanitized to use hyphens for Kubernetes volume names while preserving the original name in mount paths. Webhook validation prevents naming conflicts after sanitization. Contributed by @shusaan. (#9386)
Fixed a critical issue where the TimelineID in the cluster status was not reset to 1 after a major version upgrade. Because pg_upgrade initializes a new timeline, keeping the old ID (e.g., timeline 2) caused replicas to attempt to restore incompatible history files from object storage, leading to fatal "requested timeline is not a child of this server's history" errors. (#9830)
Fixed a bug where replicas could enter a crash loop by attempting to download timeline history files from future timelines. This occurred when stale files remained in the WAL archive from a previous cluster life, and replicas would incorrectly try to fetch them during recovery. (#9650)
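The extension-name sanitization from #9386, and the conflict it can create, can be sketched as follows. Kubernetes volume names must be DNS-1123 labels, which cannot contain underscores. Mapping "_" to "-" can therefore make two distinct extension names collide, which is the case the webhook guards against. Both function names are hypothetical, and the sketch ignores the volume-name prefixes introduced later in #9973.

```python
from collections import defaultdict


def extension_volume_name(extension: str) -> str:
    """Map an extension name to a volume-safe name (underscores become hyphens)."""
    return extension.replace("_", "-")


def find_volume_name_conflicts(extensions):
    """Group extension names that collapse to the same volume name."""
    by_volume = defaultdict(list)
    for name in extensions:
        by_volume[extension_volume_name(name)].append(name)
    return {vol: names for vol, names in by_volume.items() if len(names) > 1}


print(extension_volume_name("pg_partman"))  # pg-partman
print(find_volume_name_conflicts(["pg_ivm", "pg-ivm", "pg_partman"]))
# {'pg-ivm': ['pg_ivm', 'pg-ivm']}
```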
Fixed a race condition in replica_cluster setups duri
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.